Quantifying hierarchy and prestige in US ballet academies as social predictors of career success

In the recent decade, we have seen major progress in quantifying the behaviors and the impact of scientists, resulting in a quantitative toolset capable of monitoring and predicting the career patterns of the profession. It is unclear, however, if this toolset applies to other creative domains beyond the sciences. In particular, while performance in the arts has long been difficult to quantify objectively, research suggests that professional networks and prestige of affiliations play a similar role to those observed in science, hence they can reveal patterns underlying successful careers. To test this hypothesis, here we focus on ballet, as it allows us to investigate in a quantitative fashion the interplay of individual performance, institutional prestige, and network effects. We analyze data on competition outcomes from 6363 ballet students affiliated with 1603 schools in the United States, who participated in the Youth America Grand Prix (YAGP) between 2000 and 2021. Through multiple logit models and matching experiments, we provide evidence that schools’ strategic network position bridging between communities captures social prestige and predicts the placement of students into jobs in ballet companies. This work reveals the importance of institutional prestige on career success in ballet and showcases the potential of network science approaches to provide quantitative viewpoints for the professional development of careers beyond science.


S1 Youth America Grand Prix
The Youth America Grand Prix (YAGP) is a highly influential ballet competition that aims to discover and reward young dancers. This paid event provides education and job opportunities to dance students in pre-professional training, and is seen as an effective way to promote dancers on the path to successful careers. The competition is held in two stages, with multiple regional semi-finals and one final competition each year. Participants are divided into age categories: Pre-Competitive (9-11 years old), Junior (12-14 years old), and Senior (15-19 years old; 20 years old after the COVID-19 pandemic in 2020).
The competition recognizes dancers with two types of awards. The first are the competition medals (gold, silver, and bronze), which are given based on aggregate scores across judges and dance categories. The top-three placement is based on a 1-100 scoring system in which the competition jury evaluates technical and artistic elements of performance. The medals are given to the highest scores in each competition group, formed by Division and Category, at each venue. The second is the Grand Prix, which is awarded based on the subjective appreciation of exceptional performance by a committee of judges without explicit criteria. The Grand Prix is not always awarded, and is thus considered the highest distinction of the competition.
Both awards can be given during the semi-finals or finals; the medals may reflect ties, while the Grand Prix is awarded to only one student per division, or may not be given at all. The YAGP publicly reports the medal and Grand Prix winners, and the overall top students (up to the top 12), by year and location. As many YAGP competitors are awarded scholarships and professional contracts in ballet companies, the YAGP also reports on the successes of its alumni, including their pre-professional affiliations and current job placements.

S2 Ballet Schools
We define a 'ballet school' as any organization that offers ballet training. In the US, ballet schools vary in their organization. There are both university and non-university programs. For instance, the Higher Education Arts Data Services 2020-2021 report lists 76 universities affiliated to the National Association of Schools of Dance. For the context of this research, we must consider that most dancers competing at YAGP would engage in pre-professional ballet training in non-university institutions because of the competitive age divisions, ranging from 10 to 19 years old.

S2.1 Distribution of Ballet Schools across the U.S.
We explore the geographic location of schools to map the distribution of ballet academies across the US. To do this, we perform a Google search of all schools' addresses to find their georeference. Simultaneously, we confirm the formation of communities in the network of ballet schools using the Louvain algorithm [1], as this helps us understand whether schools tend to compete in clusters. We find that the communities detected with the algorithm and their geographical regions strongly overlap. We show the resulting network in Figure S1, with colors assigned by detected community and node positions based on actual US regions: Northeast, East, Midwest, South, and West.
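The Louvain algorithm searches for a partition that maximizes modularity, Q. The sketch below does not implement Louvain itself; it only evaluates Q for a given partition of an unweighted, undirected graph, which is one way to check how well a candidate grouping (for example, a grouping by geographic region) matches the network structure. The function name, the edge-list layout, and the toy graph are illustrative assumptions, not part of the source.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges     -- list of (u, v) pairs, one entry per undirected edge
    community -- dict mapping each node to its community label
    (Both layouts are hypothetical choices for this illustration.)
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of node degrees in the community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    # Q = sum_c [ L_c / m - (d_c / 2m)^2 ]
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy network: two triangles of schools joined by a single bridging edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(toy_edges, two_groups)  # clear community structure
q_one = modularity(toy_edges, one_group)   # trivial partition
```

In this toy graph, splitting the two triangles yields a positive Q, while the single-community partition yields Q = 0.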

S2.2 Schools' achievement
Achievement, measured by the number of awards received per school, is an important factor influencing prestige because it can capture the level of social recognition that an institution has for its outstanding performance. Here, we explore schools' achievement in the YAGP competition and the existing patterns in the distribution of awards, to test the relationship with network centrality. This analysis provides insights on whether social prestige is rooted in achievement, or whether it is also related to the richness of professional connections.
From the YAGP's structured data, we quantify schools' achievement as a proxy derived from the number of awards received during their participation in the competition. We compute the following metrics:
i. School's total number of awards, $A_k$: the sum of awards over all competitions $J$ (semi-finals and finals) that each student $i$ in the set of students $I$ obtains per competition $j$ while affiliated to school $k$. This can be written as $A_k = \sum_{j=1}^{J} \sum_{i=1}^{I} A_{ij}\,\theta_{ijk}$, where $A_{ij}$ is the individual sum of awards per competition, $\theta_{ijk} = 1$ if student $i$ is affiliated to school $k$ for competition $j$, and $\theta_{ijk} = 0$ otherwise.
ii. Ratio of awards per school, $R_k$: the ratio of obtained awards derived from the number of competitions where affiliated students ranked in the top 12. The ratio of awards for schools is $R_k = A_k / T_k$, where $T_k$ is the total number of top students affiliated to school $k$, given by $T_k = \sum_{i=1}^{I} \theta_{ik}$, with $\theta_{ik} = 1$ if student $i$ is affiliated to $k$, and $\theta_{ik} = 0$ otherwise; $R_k$ then gives the number of awards per student affiliated to each school. Here, $R = 0$ indicates that no awards were obtained, $R \in (0, 1)$ indicates less than one award per competition, $R = 1$ gives an even relation between the number of awards and competitions, and $R > 1$ corresponds to more than one award per competition in the top 12. We label these groups 'no medals', 'under-achiever', 'break-even', and 'high-achiever', respectively.
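The two school-level metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout `(student, school, competition, awards)`, the function names, and the toy data are hypothetical and do not reflect the actual structure of the YAGP data.

```python
from collections import defaultdict


def ratio_group(r):
    """Map a ratio of awards R to the four group labels used in the text."""
    if r == 0:
        return "no medals"
    if r < 1:
        return "under-achiever"
    if r == 1:
        return "break-even"
    return "high-achiever"


def school_achievement(records):
    """Return {school: (A_k, T_k, R_k, group)}.

    records -- iterable of (student, school, competition, awards) tuples,
               one per top-12 listing (hypothetical layout).
    """
    total_awards = defaultdict(int)  # A_k: awards summed over students and competitions
    top_students = defaultdict(set)  # distinct top students per school, counted in T_k
    for student, school, _competition, awards in records:
        total_awards[school] += awards
        top_students[school].add(student)

    result = {}
    for school, students in top_students.items():
        a_k = total_awards[school]
        t_k = len(students)
        r_k = a_k / t_k
        result[school] = (a_k, t_k, r_k, ratio_group(r_k))
    return result


# Toy data, invented for illustration only.
toy_records = [
    ("s1", "School X", "SF-2019", 2),
    ("s1", "School X", "SF-2020", 1),
    ("s2", "School X", "SF-2019", 0),
    ("s3", "School Y", "SF-2019", 0),
    ("s4", "School Z", "SF-2019", 1),
    ("s5", "School Z", "SF-2019", 0),
]
toy_result = school_achievement(toy_records)
```

Counting distinct students in `top_students` follows the definition of $T_k$ as the number of top students affiliated to the school, rather than the number of listings.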
Figure S2A shows the total number of awards per school, from which we observe a fat-tailed distribution, indicating that most schools that have won awards have obtained only one or two, while a few schools collected more than 100 awards. To better understand the relationship between the total number of awards ($A_k$) and the total number of students ($T_k$) per school $k$, we compute the ratio of awards as $R_k = A_k / T_k$. From this ratio, four groups of schools are observed (see the distribution of $R_k$ in Figure S2B). We find that about 75% of schools have a low ratio of awards. For instance, 692 schools (43.2% of all) did not receive any awards ($R_k = 0$), even though they had top students listed by the YAGP; and 519 schools (32.4%) are 'under-achievers', with less than one award per top student ($R_k \in (0, 1)$, in blue). Conversely, only 227 schools (14.1%) have a 'break-even' ratio ($R_k = 1$, in yellow), indicating one award per top student; and only 165 schools (10.3%) are 'high-achievers' ($R_k > 1$, in red), meaning that their students obtain more than one award. Notably, the 'high-achiever' group ranges from very low to high numbers of students, indicating that a high ratio of awards is independent of the frequency of schools' participation in the competition.
Next, we examine the relationship between schools' ratio of awards and betweenness centrality. In Figure S2C, we observe that most schools are located in the lower quartile of betweenness. Taken together, these findings help contextualize the existing relationship between achievement and its variation by schools' network position.

S2.3 Ranking of Ballet Schools
To rank schools by their social prestige, we first evaluate different network centralities and schools' achievement to test which measure captures prestige more accurately. Then, we validate our ranking method by comparing our metrics with an external selection of prestigious schools from dance experts. The network centralities are listed in Table S1 and were selected based on the reported association of network position and social prestige [2]. Achievement is measured by the number of awards received per school (see SI S3.1), an important factor influencing prestige because it can capture the level of social recognition that an institution has for its outstanding performance [3,4]. To control the effect of the number of awards ($A_k$) by the number of students ($T_k$, a proxy of schools' size), we compute the ratio of awards as $R_k = A_k / T_k$ and include it as a metric of achievement.
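For orientation, the four centralities considered here are commonly written in the following textbook forms. These are standard definitions and may differ in notation or normalization from the formulas in the cited sources. The symbols are assumptions introduced for this illustration: $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, $\sigma_{st}(k)$ the number of those passing through $k$, $d(i,j)$ the shortest-path distance, $N$ the number of nodes, $M_{ij}$ the adjacency matrix, and $\lambda$ its largest eigenvalue.

```latex
\begin{align*}
\text{Betweenness:} \quad & B_k = \sum_{s \neq k \neq t} \frac{\sigma_{st}(k)}{\sigma_{st}} \\
\text{Closeness:}   \quad & C_i = \frac{N - 1}{\sum_{j \neq i} d(i, j)} \\
\text{Degree:}      \quad & \deg(i) = \sum_{j} M_{ij} \\
\text{Eigenvector:} \quad & x_i = \frac{1}{\lambda} \sum_{j} M_{ij}\, x_j
\end{align*}
```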
Table S1: Centrality metrics to assess the prestige of ballet schools. Formula in general notation as described in the specified citation.

Network measure    Description                                  Formula
Betweenness        how much a node connects two other nodes
Closeness          a node's proximity to other nodes            C_i =
Degree             a node's connectivity
Eigenvector        a node's influence

Separately, we collect an external selection of prestigious schools by ballet experts from different sources. The first is Dance Magazine, a leading multimedia platform in the dance world that also partners with other dance publications (e.g. Pointe, Dance Spirit, and Dance Teacher). We use their selection of top pre-professional academies for the academic year 2022-2023 [9]. We also collect the list of top ballet schools in the US and the 2023 Summer Intensive Guide from the blog A Ballet Education, one of the most reliable online ballet experts for teachers, students, ballet professionals, and ballet lovers [10]. In total, the list of top schools formed from these sources contains 60 school names, shown in Figure S3.
We implement a classification model using a binary label for the list of Top Ballet Schools (1: top school, 0: not a top school) to test how accurately the schools' centrality and achievement measures capture social prestige. Considering each metric in a separate model, we measure the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve using the pROC package for R [11]. The AUC is a score between 0 and 1, where a value closer to 1 indicates a higher probability of correct classification, while a score of 0.5 or lower indicates that the model performs no better than random guessing. The AUC and ROC for each centrality are shown in Figure S4. The AUC = 0.75 suggests that betweenness centrality (Fig. S4A) is the most accurate centrality measure for capturing social prestige, relative to the other centrality metrics (Fig. S4B-D). On the other hand, when we explore the ratio of awards (number of total awards controlled by school size), we observe the lowest accuracy in predicting schools' prestige (Fig. S4F), with an AUC = 0.71.
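To make the AUC concrete, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen top school has a higher score than a randomly chosen non-top school, with ties counted as one half. The source uses the pROC package for R; this pure-Python version is only an illustration, and the function name and toy values are assumptions.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels -- 1 for a top school, 0 otherwise
    scores -- the metric under test (e.g. a centrality value) per school
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # top school correctly ranked above non-top school
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with invented values: two top schools, two other schools.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
```

The double loop is quadratic in the number of schools, which is acceptable for an illustration; rank-based formulas give the same value more efficiently.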
Further, we order the schools by their betweenness centrality and create a ranking list. In this ranking, a school $k$ with $r_k = 1$ has the largest centrality value, i.e. it is the most central or prestigious, and the school with the largest rank value (e.g. $r_k = 945$) has the lowest centrality in the set of schools $K$. Based on the number of observations per school, we select the upper 5% of schools from the network-based ranking. The list of schools is shown in Figure S5. We use this selection to estimate the treatment effect of being in a top school on job placement, notated as $Y_i = 1$ for students affiliated to a prestigious school, and $Y_i = 0$ for students who attended a less prestigious school.
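The ranking step can be sketched as follows. The sketch assumes an unweighted, undirected school network stored as an adjacency dictionary, computes betweenness with Brandes' algorithm, and keeps the top fraction of the resulting ranking. The graph representation, function names, tie-breaking by name, and the rounding rule for the cut-off are illustrative assumptions, not details given by the source.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


def rank_schools(centrality):
    """Return {school: rank}, where rank 1 has the largest centrality."""
    ordered = sorted(centrality, key=lambda k: (-centrality[k], k))
    return {school: r for r, school in enumerate(ordered, start=1)}


def top_fraction(ranks, fraction=0.05):
    """Schools in the upper `fraction` of the ranking (at least one school)."""
    n_top = max(1, int(len(ranks) * fraction))
    return [k for k, r in sorted(ranks.items(), key=lambda kv: kv[1]) if r <= n_top]


# Toy network: one hub school linked to three others.
toy_adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
toy_bc = betweenness(toy_adj)
toy_ranks = rank_schools(toy_bc)
toy_top = top_fraction(toy_ranks, fraction=0.25)
```

In the toy network the hub lies on every shortest path between the other three schools, so it receives rank 1 and is the only school in the top 25%.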

S3.1 Students' achievement
We quantify students' performance in the competition by the number of awards and the number of times they are listed as top students by the YAGP. We use these measures as a proxy of performance in ballet. We compute the following metrics:
i. Student's total competitions, $T_i$: the number of semi-final competitions in which the student is listed as a top student. Multiple participations in the competition per year are possible, thus $T_i$ counts only competitions in the semi-finals.
ii. Student's total number of awards, $A_i$: the sum of awards $a$ that each student $i$ obtains per competition $j$, considering the medals earned in each category ($c$ = classical, $m$ = contemporary) and the Grand Prix ($g$). Then, $A_i = A_{ic} + A_{im} + A_{ig}$, where $A_{ic} = \sum_{j=1}^{T_i} a_{ij}$, with $a_{ij} = 1$ when the dancer was awarded a medal in the classical category in competition $j$, for all semi-final competitions in $C$ up to $T_i$, and $a_{ij} = 0$ when the student did not obtain a medal. The notation is the same for all possible awards in $c$, $m$, and $g$. A student cannot win more than one medal per category in each competition, so the possible number of awards in a given competition is $A_{ij} \in \{0, 1, 2, 3\}$. In a similar fashion, competition medals per student are counted by type (gold, silver, bronze).
iii. Ratio of awards per student, $R_i$: the ratio of obtained awards derived from the number of times the competitor ranked in the top 12. The ratio of awards for students is given by $R_i = A_i / T_i$ and averages the awards obtained per student in each competition. Similarly to schools' ratio of awards, $R = 0$ indicates that no awards were obtained, $R \in (0, 1)$ indicates less than one award per student, $R = 1$ gives an even relation between the number of awards and student (e.g. a student won one award every time they were listed in the top 12), and $R > 1$ corresponds to more than one award per student in the top 12. We label these groups 'no medals', 'under-achiever', 'break-even', and 'high-achiever', respectively.
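As a worked illustration with hypothetical numbers (not taken from the YAGP data), consider a student listed in the top 12 at three semi-finals who wins a classical medal at two of them, no contemporary medal, and one Grand Prix. Taking the ratio as awards per top-12 listing, $R_i = A_i / T_i$:

```latex
\begin{align*}
T_i &= 3, \\
A_i &= A_{ic} + A_{im} + A_{ig} = 2 + 0 + 1 = 3, \\
R_i &= \frac{A_i}{T_i} = \frac{3}{3} = 1 \quad \text{('break-even')}.
\end{align*}
```

A student with the same three listings but a single medal would have $R_i = 1/3$ and fall in the 'under-achiever' group.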
In Figure S6A, we observe the power-law distribution of students' number of awards, $A_i$. This indicates that only a few students win multiple awards, while the majority can achieve at least one medal. This distribution of awards is consistent with the one observed in the number of awards per school (Figure S2A). Most students (75%, 4824 students) have a success ratio lower than one, indicating that they win less than one award per competition. From this fraction, 3642 students (57% of all) obtained 'no medals', $R_i = 0$, even though they were listed in the top 12. Figure S6B shows that, among those who obtained at least one medal, 18.5% (1182 students, in blue) are 'under-achievers', $R_i \in (0, 1)$, meaning that they obtained fewer awards than the number of times they were listed in the top 12 ($A_i < T_i$). Conversely, a very small fraction of students (4.8%, 309 students, in red) are 'high-achievers', characterized by a high success ratio, $R_i > 1$, with more awards than the number of times they were listed in the top 12 ($A_i > T_i$). Only 19.7% of students (1260 students, in yellow) obtained a 'break-even' success ratio ($R_i = 1$).
One would expect the number of awards per student ($A_i$) to correlate strongly with the total times in the top 12 ($T_i$), indicating that awards accumulate as students participate in the competition. This relationship is moderate (Pearson's correlation coefficient = 0.68), suggesting that being listed in the top 12 multiple times is a mild rather than a strong predictor of the total number of awards. The majority of students (3511 students, 55%) have only one reported competition in the top 12 ($T_i = 1$), but we see in Figure S6D that all groups by ratio of awards have an exponential distribution with respect to $T_i$. This indicates that there is a general pattern of variability in $T_i$ that is independent of students' success ratio, $R_i$, and also illustrates that the awards are actually scarce, even among top contenders.
To explore the distribution of awards, we compute the observed probability of obtaining an award, $P(\ast)$, where $\ast$ represents each award by its implied rank. The implied rank $\ast$ is, for the semi-finals (SF): no medals, Bronze, Silver, Gold, Grand Prix; and for the finals (F): no medals, Bronze, Silver, Gold, Grand Prix (this rank does not assume the award value given by individuals). We divide the total number of awards per rank by the total number of top 12 students in the competition pool. Figure S6C shows that YAGP competitors have a 0.62 probability of being placed in the top 12 without winning an award, while winning a competition medal in the semi-finals has a probability of about 0.1 for gold, silver, and bronze. Interestingly, the probability of winning the Grand Prix in the semi-finals is rather low (0.03) but higher than that of being in the top 12 of the finals without obtaining an award (0.017). Winning any competition medal in the YAGP finals has a very small probability ($P(\ast \in \{\text{F: bronze, silver, gold}\}) < 0.005$), and the probability of winning the Grand Prix at this competition level is even smaller (0.002). Taken together, our findings draw a picture of the level of competitiveness that YAGP competitors face and show that repeated competitions do not imply a higher success ratio, even among top contenders. They also raise the question of what factors other than persistence (and accumulated practice or experience) determine which dancers are recognized by the jury in the YAGP competition venues.
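The observed probabilities described above are relative frequencies: the count of each outcome divided by the total number of top-12 listings. A minimal sketch, assuming a hypothetical list with one outcome label per top-12 listing (the label strings and toy counts are invented for illustration):

```python
from collections import Counter


def award_probabilities(outcomes):
    """Observed probability of each outcome among top-12 listings.

    outcomes -- list with one label per top-12 listing, e.g. "SF: gold"
                (hypothetical labels; the source does not specify a format).
    """
    if not outcomes:
        raise ValueError("no outcomes given")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {rank: n / total for rank, n in counts.items()}


# Toy pool of ten top-12 listings, invented for illustration.
toy_outcomes = (
    ["SF: no medals"] * 6
    + ["SF: bronze", "SF: silver", "SF: gold", "SF: Grand Prix"]
)
toy_probs = award_probabilities(toy_outcomes)
```

By construction the probabilities over all outcome labels sum to one, which provides a simple consistency check on any reported set of values.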

S3.2 Students' success
Table S2 shows four variations of our base model. Model 1 corresponds to the model described in Eq. 2. In model 2, we test for the potential effect of advancing to the competition finals by adding a dummy variable for being a finalist ($F_i = 1$) or not ($F_i = 0$). Models 1 and 2 are discussed in the results section.
In model 3, we examine the role of an affiliation for a job placement. Here, we replace the measure of school's prestige with a dummy variable for being affiliated ($D_i = 1$) or being an independent competitor ($D_i = 0$). This model shows no statistical effect for attending a school versus being an independent competitor, suggesting that there is no statistical difference between being affiliated or not regarding the chances of obtaining a job placement. Separately, in model 4 we explore the effect of total competition medals and observe that winning any medal has a significant but small positive effect on the chances of obtaining a job placement, increasing them by only 49%.
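The source reports effects as percentage increases in the chances of job placement without spelling out the conversion. Assuming these percentages are changes in odds derived from logit coefficients (an assumption, not confirmed by the source), a coefficient $\beta$ maps to a percentage change of $(e^{\beta} - 1) \times 100$. The sketch below shows this conversion; the function names and the coefficient value are hypothetical.

```python
import math


def percent_change_in_odds(beta):
    """Percentage change in the odds for a one-unit increase in a predictor."""
    return (math.exp(beta) - 1.0) * 100.0


def predicted_probability(intercept, betas, xs):
    """Logistic model: P(y = 1) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficient chosen so that the odds rise by 49%.
beta_medal = math.log(1.49)
change = percent_change_in_odds(beta_medal)
```

Note that a 49% increase in odds is not a 49-point increase in probability; the change in probability depends on the baseline, which `predicted_probability` makes explicit.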
In addition, we check the robustness of our treatment effect analysis on being affiliated to a top school, and observe that when we use the top 10% of schools (148 schools, 4217 controls, 3501 treated), the chances of job placement increase by 43% (p = 0.0015).
In sum, our analyses show that school prestige has a robust and significant positive effect on dancers' job placement, an effect comparable to that of being a finalist, but more importantly, that even when performance is similar, there are social factors such as prestige driving the selection of dancers towards successful company positions.

Figure S1: Network of ballet schools in the US. Ballet schools are nodes, and two schools are linked by winning in the co-competition of the YAGP. Node size is given by schools' betweenness centrality, $B_k$. Schools' positions correspond to their actual geographical location. Nodes are colored by detected communities: Northeast (purple), East (blue), West (orange), South (green), and Midwest (yellow). Edges are colored for intra-community links and gray for inter-community links.

Figure S3: List of top schools. Selection of top schools by US experts from Dance Magazine and A Ballet Education. Schools listed in no particular order.

Figure S4: AUC for network centrality and ratio of awards. Panel A shows that betweenness centrality is the best indicator of prestige, with the largest AUC = 0.75 among all centralities in panels B-D and schools' achievement in panel E.

Figure S5: Ranking of ballet schools. The list contains the top 5% most prestigious U.S. ballet academies from the network-based ranking.

Figure S6: Awards of ballet students. A shows the count of the number of awards per student, $A_i$, with a power-law distribution. This indicates that most students obtained only one award, while only a few obtained more than 10 awards. B shows the percentage of students by group of success ratio, $R_i$, obtained by dividing the number of awards by the number of times they placed as a top student. Taken together, 'no medals' ($R_i = 0$, in gray) and 'under-achievers' ($R_i \in (0, 1)$, in cyan) represent 75.3% of the total number of competitors (6393 students), meaning that most competitors win less than one award per top placement. About 19.7% of students (1260 students) win one award per top placement ($R_i = 1$, in yellow), while only 4.8% of students (309 students) obtain more than one award per top placement. C shows the probability of obtaining awards by rank for the finals (orange) and semi-finals (blue). We see that most top students do not obtain an award, and the probability of being awarded decreases as the rank increases, emphasizing scarcity in high-rank awards (i.e. Finals). D shows the count of students and their number of competitions, $T_i$, by their ratio of awards group, $R_i$, and demonstrates that all groups display an exponential distribution, suggesting that there is a general pattern of variability in the number of awards obtained per student regardless of the number of semi-final competitions.

Figure S2: Awards and centrality of ballet schools. A shows the fat-tailed distribution of total awards per school, $A_k$, indicating that among those schools obtaining awards, most of them have only one, and only a few have more than 100 awards. B shows the log distribution of awards $A_k$ with respect to the number of students $T_k$ per school $k$, i.e. the ratio of awards, $R_k$. Color indicates the corresponding group by ratio of awards and their percentage (the plot does not display schools with 'no medals', $A_k = 0$). Only 56.8% of schools obtained at least one award, and the total number of awards increases proportionally to the number of students. However, the ratio of 'high-achievers' is independent of their number of top students. C shows the relationship of betweenness centrality, $B_k$, with each group by ratio of awards. Schools with no medals (in gray) have the lowest centrality values, closer to zero. On the contrary, only highly central schools are in the 'high-achievers' group, yet this group also includes low centrality schools. Moreover, 'under-achiever' schools have a wider distribution across the lower quartile, indicating that these schools can have higher centrality than the other groups.

Table S2: Model coefficients for the probability of success. Model coefficients labeled by p-value. Standard errors in parentheses.